issue/194: Support Quantization Config and Quantized Model Inference #195
qinyiqun wants to merge 10 commits into InfiniTensor:main
Conversation
qinyiqun commented on Jan 21, 2026
- Add a quantization option to the linear classes.
- Introduce the nlohmann json library.
- Add two top-level classes, quantization config and global config, to support multiple kinds of advanced feature config (see the sketch after this list).
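A minimal sketch of how such a quantization section might be read with nlohmann json. The field names (`quantization`, `quant_method`, `bits`) and the helper `load_quant_config` are illustrative assumptions, not taken from this PR:

```cpp
#include "nlohmann/json.hpp"
#include <fstream>
#include <optional>
#include <string>

struct QuantFileConfig {
    std::string method; // e.g. "w8a8"
    int bits = 8;
};

// Returns nullopt when the model config carries no quantization section.
inline std::optional<QuantFileConfig> load_quant_config(const std::string &path) {
    std::ifstream in(path);
    const nlohmann::json j = nlohmann::json::parse(in);
    if (!j.contains("quantization")) {
        return std::nullopt; // unquantized model
    }
    QuantFileConfig cfg;
    cfg.method = j["quantization"].value("quant_method", std::string{});
    cfg.bits = j["quantization"].value("bits", 8);
    return cfg;
}
```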
csrc/engine/rank_worker.cpp (Outdated)

      // Create model using factory (may be expensive)
    - model_ = InfinilmModelFactory::createModel(model_config_, rank_info_, pending_cache_config_ != nullptr ? pending_cache_config_.get() : nullptr);
    + model_ = InfinilmModelFactory::createModel(model_config_, rank_info_, pending_cache_config_ != nullptr ? pending_cache_config_.get() : nullptr, global_config_);
Reviewer: Why is there both a model config and a global config?

qinyiqun: model config is the original llama_config; global config is now only responsible for advanced features.

Reviewer: Replace the original llama config with a generic json.
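A minimal sketch of what that suggestion could look like, assuming the parsed file is kept as a generic `nlohmann::json` instead of a per-architecture struct. The class name and accessors here are hypothetical:

```cpp
#include "nlohmann/json.hpp"
#include <cstddef>
#include <utility>

class JsonModelConfig {
public:
    explicit JsonModelConfig(nlohmann::json j) : j_(std::move(j)) {}

    // Typed accessors replace a dedicated config struct per architecture.
    std::size_t hidden_size() const { return j_.at("hidden_size").get<std::size_t>(); }
    std::size_t num_layers() const { return j_.at("num_hidden_layers").get<std::size_t>(); }

private:
    nlohmann::json j_;
};
```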
    }

    infinicore::nn::Parameter QKVParallelLinear::get_q_weight_scale() const {
        return infinicore::nn::Parameter(

qinyiqun: I don't think that's necessary. It's similar to bias: bias is controlled by a single has_bias flag, and these get_xx_scale() getters are used inside the macros, which as written are bound to the quantization method. Even if the getter returned an optional, the macro would still have to unwrap it.
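A side-by-side sketch of the two designs discussed above, with placeholder types; `has_weight_scale` and `weight_scale()` are hypothetical names, not the PR's API:

```cpp
#include <optional>

struct Parameter {}; // stand-in for infinicore::nn::Parameter

class LinearSketch {
public:
    bool has_weight_scale = false; // flag style, mirroring has_bias

    // Optional style: every caller (here, the init macros) must unwrap.
    std::optional<Parameter> weight_scale() const {
        if (!has_weight_scale) {
            return std::nullopt;
        }
        return scale_;
    }

private:
    Parameter scale_;
};
```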
Force-pushed from 85c2485 to 9dee06a.
csrc/config/global_config.hpp (Outdated)

    #include <fstream>
    #include <string>

    namespace infinilm::config::global_config {

Reviewer: There should be no need for the extra global_config namespace.
csrc/config/global_config.hpp (Outdated)

    #include <string>

    namespace infinilm::config::global_config {
    struct GlobalConfig {

Reviewer: Just use a class. Also, consider renaming it to something more intuitive, such as ModelConfig.

qinyiqun: My original idea was that it could also wrap the distributed config and the kv cache config, which is why I called it global_config.
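A sketch of the aggregation the reply describes; the member names and the sub-config types below are stand-ins, not the PR's definitions:

```cpp
#include <optional>

struct QuantConfig { int bits = 8; };         // stand-ins for the real
struct DistConfig { int tp_size = 1; };       // per-feature config types
struct KVCacheConfig { int max_blocks = 0; };

struct GlobalConfig {
    std::optional<QuantConfig> quantization;  // absent for fp models
    std::optional<DistConfig> distributed;    // candidates for future
    std::optional<KVCacheConfig> kv_cache;    // inclusion, per the reply above
};
```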
csrc/config/quant_config.hpp (Outdated)

    #include "../quantization/quantization.hpp"
    #include "nlohmann/json.hpp"

    namespace infinilm::config::quantization {

Reviewer: Likewise, the quantization namespace isn't needed here. These configs shouldn't run into any name collisions.
| #include "nlohmann/json.hpp" | ||
|
|
||
| namespace infinilm::quantization { | ||
| class BaseQuantization { |
There was a problem hiding this comment.
这层封装的意义是什么,看着好像只是传了个quant scheme,但这个功能不是QuantConfig就能做吗
There was a problem hiding this comment.
现在传config是因为逻辑太少了,为之后开发预留的类,现在有一个需求是模型级别的量化,需要在量化方法之上进行一个封装
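A sketch of the role the reply describes, with placeholder types; the virtual hook `should_quantize_layer` is an illustrative assumption about what the reserved class might grow into:

```cpp
struct QuantConfig { int bits = 8; }; // stand-in for the real QuantConfig

class BaseQuantization {
public:
    explicit BaseQuantization(QuantConfig cfg) : cfg_(cfg) {}
    virtual ~BaseQuantization() = default;

    // Today this only forwards the scheme taken from the config...
    int bits() const { return cfg_.bits; }

    // ...but a subclass could later make model-level decisions, e.g. which
    // layers stay in full precision.
    virtual bool should_quantize_layer(int /*layer_idx*/) const { return true; }

private:
    QuantConfig cfg_;
};
```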
    // ========================= QKV Quantization ==================================
    #define INFINILM_QKV_LINEAR_W8A8_INIT(name, q_name, k_name, v_name, ...) \
        name##_ = std::make_shared<layers::QKVParallelLinear>(__VA_ARGS__);  \
        /* Register the Q weight */                                          \
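The excerpt above cuts off mid-macro. A hypothetical continuation of the pattern, with stand-in types and a renamed macro to make clear it is not copied from the PR: after constructing the layer, such a macro could register each fused weight together with its quantization scale.

```cpp
#include <memory>
#include <string>

struct Parameter {}; // stand-ins for the real parameter and layer types
struct QKVLinear {
    Parameter q_weight() const { return {}; }
    Parameter q_weight_scale() const { return {}; }
};

// Stand-in for the model's parameter registry.
inline void register_parameter(const std::string & /*name*/, Parameter /*p*/) {}

#define SKETCH_QKV_LINEAR_W8A8_INIT(member, q_name, ...)               \
    member = std::make_shared<QKVLinear>(__VA_ARGS__);                 \
    /* Register the Q weight and its quantization scale together. */   \
    register_parameter(q_name, member->q_weight());                    \
    register_parameter(std::string(q_name) + ".weight_scale",          \
                       member->q_weight_scale());
```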
    std::shared_ptr<InfinilmModel> model;
    if (const auto llama_config_ptr = dynamic_cast<const models::llama::LlamaConfig *>(&config)) {
        const auto &llama_config = *llama_config_ptr;
        //****************************NEED TO BE FIXED */
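For context, a minimal sketch of the dispatch pattern this excerpt uses; the surrounding factory code is assumed, and the types below are stand-ins:

```cpp
#include <memory>
#include <stdexcept>

struct ConfigBase { virtual ~ConfigBase() = default; };
struct LlamaConfig : ConfigBase {}; // stand-in for models::llama::LlamaConfig
struct InfinilmModel {};

inline std::shared_ptr<InfinilmModel> create_model(const ConfigBase &config) {
    // Downcast per architecture; unknown config types are an error.
    if (dynamic_cast<const LlamaConfig *>(&config) != nullptr) {
        return std::make_shared<InfinilmModel>(); // build the llama model here
    }
    throw std::runtime_error("unsupported model config type");
}
```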
    //------------------------------------------------------
    InferEngine::InferEngine(
        const InfinilmModel::Config &config,
        const distributed::DistConfig &distributed_config,

Reviewer: Why does this change touch this interface at all, and why reorder its parameters on top of that? Modifying a foundational interface like this is a high-risk change. Are you sure nothing will break elsewhere?
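One lower-risk alternative, sketched here as an assumption rather than anything proposed in the PR: append the new argument with a default instead of reordering, so existing call sites keep compiling unchanged. Types are placeholders.

```cpp
#include <memory>

struct Config {};        // stand-ins for InfinilmModel::Config,
struct DistConfig {};    // distributed::DistConfig, and the new
struct GlobalConfig {};  // global config type

class InferEngine {
public:
    // New parameter appended with a default: callers that pass only
    // (config, distributed_config) are unaffected.
    InferEngine(const Config &config,
                const DistConfig &distributed_config,
                std::shared_ptr<const GlobalConfig> global_config = nullptr);
};
```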